Objective: To identify differentially expressed genes, you must first conduct quality control assessments of your experiments. Next, perform statistical diagnostic analyses to detect and correct any experimental biases.


The experiments were performed comparing one cell culture incubated at atmospheric oxygen conditions (call normoxic and labelled using Green dye) and another one incubated in 1% O2 (call hypoxic and labelled using Red dye).


Data loading

library(marray)

The R package Marray offers several functions to:

  • Read GPR files
  • Draw graphical representations of microarray results (foreground and background signals, missing values, MAplots, etc.)
  • Perform the normalization between Red and Green signals.

GPR file

Input : A GPR file with detailed information for each spot on the slide (gene name, Red and Green intensity values, background intensities and other statistics).

# Read the GPR file using the marray package function read.GenePix
rawdata <- read.GenePix(fnames="dataFile1_normAnalysis.gpr",
                        path= "/shared/projects/2420_ens_hts/data/microarrays")
## Reading ...  /shared/projects/2420_ens_hts/data/microarrays/dataFile1_normAnalysis.gpr

Note : This function reads a GPR file and creates objects of class marrayRaw. In these objects, you can find, for instance, vectors with intensity values (rawdata@maRf or rawdata@maGf). These vectors can be manipulated using classical R functions like “summary()”, “hist()”, etc.

Data exploration

# f is for foreground
# Intensity values in red/hypoxic channel
head(rawdata@maRf)
##      /shared/projects/2420_ens_hts/data/microarrays/dataFile1_normAnalysis.gpr
## [1,]                                                                       992
## [2,]                                                                      1907
## [3,]                                                                       559
## [4,]                                                                       645
## [5,]                                                                     32939
## [6,]                                                                       681
# Intensity values in green/normoxic channel
head(rawdata@maGf)
##      /shared/projects/2420_ens_hts/data/microarrays/dataFile1_normAnalysis.gpr
## [1,]                                                                      2561
## [2,]                                                                      2585
## [3,]                                                                      1588
## [4,]                                                                      1604
## [5,]                                                                     43755
## [6,]                                                                      1732

Visualisation of foreground signal over the microarray glass slide

# Red/hypoxic signals
image(rawdata,
      xvar = "maRf",
      main = "Hypoxic signal (with flags)")

## [1] FALSE
## NULL
# Green/normoxic signals
image(rawdata,
      xvar = "maGf",
      main = "Normoxic signal (with flags)")

## [1] FALSE
## NULL

Quality controls

Visualisation of background signals

Visualize background signals in Red/Hypoxic and Green/Normoxic channels. Try to interpret the obtained results. What do you think about the quality of the experiment?


# b is for background
# Red/Hypoxic channel background signals
image(rawdata,
      xvar = "maRb",
      main = "Hypoxic background (with flags)")

## [1] FALSE
## NULL
# Green/Normoxic channel background signals
image(rawdata,
      xvar = "maGb",
      main = "Normoxic background (with flags)")

## [1] FALSE
## NULL

Low quality spots filtering

Low quality spots will be eliminated from further analyses. Intensity values in foreground and background signals will be replaced with the R symbol “NA” (Not Available).

# You work on a copy of the rawdata
rawdataQC <- rawdata

#Background values to NA
rawdataQC@maRb[rawdataQC@maW < 0] = NA
rawdataQC@maGb[rawdataQC@maW < 0] = NA

#Signal values to NA
rawdataQC@maRf[rawdataQC@maW < 0] = NA 
rawdataQC@maGf[rawdataQC@maW < 0] = NA

Background correction

An intuitive approach for background correction consists in subtracting background intensity values (rawdata@maRb and rawdata@maGb) from global foreground intensities (rawdata@maRf and rawdata@maGf). Nevertheless this method is debatable because for low intensities it adds more noise to the data and creates overestimated log(Ratio) values.

Draw a MA plot with background subtracted from foreground values

plot(rawdataQC,legend.func = NULL, ylim=c(-9,9), xlim=c(3,16), main = "MA plot before normalization (background subtraction)")

Draw a MA plot with foreground values only

# You work on a copy of the rawdata
rawdataQCnoBg <- rawdataQC

#Replace all background by 0 as by default background is subtracted from foreground
rawdataQCnoBg@maGb[] = 0
rawdataQCnoBg@maRb[] = 0

plot(rawdataQCnoBg,legend.func = NULL, ylim=c(-9,9), xlim=c(3,16), main = "MA plot before normalization (no background subtraction)")

The following analyses will be performed with no background subtraction.


Normalisation

Comparison of Red/Hypoxic and Green/Normoxic global signals

Draw the MA plot between Red/Hypoxic and Green/Normoxic signals.
  • Is there any experimental bias?
  • What information gives the boxplot representation?
  • What kind of normalization method needs to be applied?


plot(rawdataQCnoBg, main = "MA plot before normalization")

boxplot(rawdataQCnoBg, main = "Boxplot before normalization")

Comparison of different normalization procedures

Normalize the intensity measures between Red/Hypoxic and Green/Normoxic signals.
  • Try 2 different normalization methods “median” and “loess”.
  • Draw the associated MA plot and BoxPlot (after normalization). What are the differences with the graphs obtained before normalization?
  • Draw the log2(R/G) distribution before and after normalization. How do you interpret the results?


rawdataQCnoBgMedian <- maNorm(rawdataQCnoBg, norm = "median", echo = T)
## Normalization method: median.
## Normalizing array 1.
rawdataQCnoBgLoess <- maNorm(rawdataQCnoBg, norm = "loess", echo = T)
## Normalization method: loess.
## Normalizing array 1.

MA plots after normalization

plot(rawdataQCnoBgMedian, legend.func = NULL, main = "norm = Median")

plot(rawdataQCnoBgLoess, legend.func = NULL, main = "norm = Loess")

Boxplots after normalization

boxplot(rawdataQCnoBgMedian, main = "norm = Median")

boxplot(rawdataQCnoBgLoess, main = "norm = Loess")

plot(density(maM(rawdataQCnoBgLoess),na.rm = T),
     lwd = 2, col = 2, main = "Distribution of log(Ratio)")
lines(density(maM(rawdataQCnoBg), na.rm = T), lwd = 2)
abline(v = 0)
legend(x= 0.5, y= 1.2,c("Before normalization","After normalization with loess"), fill = c(1,2))


Reproductibility

#Date
format(Sys.time(), "%d %B, %Y, %H,%M")
## [1] "21 August, 2024, 13,56"
#Packages used
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-conda-linux-gnu
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.4.1/lib/libopenblasp-r0.3.27.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Paris
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] marray_1.82.0 limma_3.60.3 
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.36     R6_2.5.1          fastmap_1.2.0     xfun_0.45        
##  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.27   
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] statmod_1.5.0     compiler_4.4.1    highr_0.11        rstudioapi_0.16.0
## [17] tools_4.4.1       evaluate_0.24.0   bslib_0.7.0       yaml_2.3.9       
## [21] rlang_1.1.4       jsonlite_1.8.8